# Transformer Architecture

Wav2vec2 Base Librispeech Demo Colab
Apache-2.0
A speech recognition model fine-tuned from facebook/wav2vec2-base on the LibriSpeech dataset, achieving a word error rate (WER) of 0.3174 on the evaluation set.
Speech Recognition Transformers
vishwasgautam
14
0
Videomae Base Finetuned Ucf101 Subset
A video classification model based on VideoMAE base, fine-tuned on a subset of UCF101.
Video Processing Transformers
cccchristopher
30
0
X2I
Apache-2.0
X2I is a multimodal diffusion Transformer model capable of converting various input modalities (text, images, videos, audio, speech) into image outputs.
Text-to-Image Other
OPPOer
435
7
Latex Finetuned
A Transformer-based optical character recognition model optimized for processing handwritten math images and structured math syntax.
Text Recognition Transformers
tjoab
109
1
Unixcoder Code Vulnerability Detector
A C/C++ code vulnerability detection model fine-tuned from Microsoft's UniXcoder, with an accuracy of 68.34% and an F1 score of 62.14%.
Text Classification Transformers English
mahdin70
416
1
Digitaledutransformers
GPL-3.0
A Transformer-based tabular classification model for financial data analysis
Text Classification Transformers
SnowFlash383935
149
1
Dna2vec
MIT
DNA sequence embedding model based on Transformer architecture, supporting sequence alignment and genomics applications
Molecular Model Transformers
roychowdhuryresearch
557
1
Finedefics
Finedefics is an open-source multimodal large language model (MLLM) that enhances fine-grained visual recognition (FGVR) capabilities by incorporating object attribute descriptions.
Image-to-Text
StevenHH2000
82
6
Terjman Large V2.0
Terjman Large-v2.0 is a Transformer-based English-Moroccan dialect translation model with significantly improved performance, comparable to commercial models.
Machine Translation Transformers Supports Multiple Languages
BounharAbdelaziz
20
1
Tabpfn Mix 1.0 Regressor
Apache-2.0
TabPFNMix is a tabular foundation model pretrained on purely synthetic datasets, utilizing an encoder-decoder Transformer architecture, suitable for tabular data regression tasks.
Materials Science
autogluon
3,474
13
Tabpfn Mix 1.0 Classifier
Apache-2.0
A foundational model for tabular data, pretrained on synthetic datasets generated by mixing random classifiers
Molecular Model
autogluon
19.77k
13
Rtdetr V2 R101vd
Apache-2.0
RT-DETRv2 is a real-time, Transformer-based object detection model that improves on the RT-DETR baseline with a set of "bag-of-freebies" optimizations.
Object Detection Transformers
apolloparty
25
0
Pixart Sigma Nitro
Apache-2.0
AMD Nitro Diffusion is a series of efficient text-to-image models, distilled from mainstream diffusion models on AMD Instinct™ GPUs. PixArt-Sigma Nitro is a high-resolution single-step inference model based on Transformer architecture.
Image Generation
amd
21
2
Trocr Base Handwritten Ru
The TrOCR model is a Transformer-based optical character recognition model, specifically fine-tuned for Russian handwritten text.
Image-to-Text Transformers Other
kazars24
1,843
9
Materials.selfies Ted
Apache-2.0
A Transformer-based encoder-decoder model specifically designed for molecular representation using SELFIES
Molecular Model Transformers
ibm-research
3,343
7
Speecht5 Fine Tune En
MIT
An English text-to-speech (TTS) model fine-tuned from Microsoft's SpeechT5, specializing in voice generation for technical-domain text.
Speech Synthesis Transformers English
Solo448
16
0
Lwm
LWM is described as the first foundation model for wireless communications, developed as a universal feature extractor that produces fine-grained representations from wireless channel data.
Physics Model Transformers
wi-lab
137
3
Pgtformer Base
PGTFormer is a PyTorch image-to-image transformation model, integrated with and pushed to the Hugging Face Hub via PyTorchModelHubMixin.
Image Generation Safetensors
kepeng
151
4
Timesformer Base Finetuned K400
TimeSformer is a Transformer-based video understanding model, specifically fine-tuned on the Kinetics-400 dataset.
Video Processing Transformers
onnx-community
17
0
Segformer B2 Human
Other
A fashion image segmentation model based on the SegFormer architecture, specifically designed for fine segmentation of clothing and accessories
Image Segmentation Transformers
sayeed99
46
1
Trocr Math Handwritten
TrOCR is a Transformer-based OCR model specifically designed for recognizing handwritten mathematical formulas
Image-to-Text Transformers
fhswf
290
6
Sat 12l Sm
MIT
Advanced sentence segmentation model based on a 12-layer Transformer architecture, supporting multilingual text segmentation tasks
Sequence Labeling Transformers Supports Multiple Languages
segment-any-text
31.44k
20
Meshanything
MeshAnything is an artist-grade mesh generation model based on autoregressive Transformers, capable of converting images or point clouds into high-quality 3D mesh models.
3D Vision
Yiwen-ntu
193
14
Dab Detr Resnet 50
Apache-2.0
DAB-DETR is an improved DETR object detection model that significantly speeds up training convergence and raises detection accuracy through a dynamic anchor box query mechanism.
Object Detection Transformers English
IDEA-Research
1,590
2
Block Diagram Global Information
A Transformer architecture model based on the Donut framework, designed to extract overall summary information from block diagram images, supporting English and Korean processing.
Image-to-Text Transformers Supports Multiple Languages
shreyanshu09
19
2
Rtdetr R18vd
Apache-2.0
RT-DETR is the first real-time end-to-end object detection Transformer model, achieving efficient NMS-free detection through a hybrid encoder and query selection mechanism
Object Detection Transformers English
PekingU
11.98k
4
MOMENT 1 Large
MIT
MOMENT is a series of general-purpose time series analysis foundation models that support multiple time series analysis tasks, offering out-of-the-box effectiveness and performance enhancement through fine-tuning.
Materials Science Transformers
AutonLab
194.93k
70
Berturk Legal
MIT
BERTurk-Legal is a Transformer-based language model specifically designed for prior case retrieval tasks in the Turkish legal domain.
Large Language Model Transformers Other
KocLab-Bilkent
382
6
Segformer B2 Fashion
Other
A fashion image segmentation model fine-tuned based on the SegFormer architecture, specifically designed for identifying and segmenting different apparel categories in clothing images
Image Segmentation Transformers
sayeed99
154
12
Vsft Llava 1.5 7b Hf Trl
A multimodal vision-language model based on LLaVA-1.5-7B trained through Visual Supervised Fine-Tuning (VSFT), supporting image understanding and dialogue generation
Image-to-Text Transformers English
HuggingFaceH4
65
14
Pix2text Table Rec
MIT
A table structure recognition model developed based on Microsoft's Table Transformer for table detection and recognition tasks in documents
Text Recognition Transformers
breezedeus
1,124
2
Model Timesformer Subset 02
A video understanding model based on the TimeSformer architecture, fine-tuned on an unknown dataset with an accuracy of 88.52%
Video Processing Transformers
namnh2002
15
0
Translate Ar En V1.0 Hplt
This is a Transformer-based machine translation model from Arabic to English, trained exclusively on HPLT data.
Machine Translation Transformers Supports Multiple Languages
HPLT
26
3
Trocr Large Spanish
MIT
A Transformer-based OCR model for printed Spanish text; it is optimized for printed fonts and does not support handwriting recognition.
Image-to-Text Transformers Supports Multiple Languages
qantev
298
11
Trocr Small Spanish
MIT
A Transformer-based OCR model optimized for printed Spanish text; it does not support handwriting recognition.
Text Recognition Transformers Supports Multiple Languages
qantev
270
7
Table Transformer Structure Recognition V1.1 All
MIT
A Transformer-based model for table structure recognition, designed to detect table structures in documents
Text Recognition Transformers
microsoft
395.03k
70
Table Transformer Structure Recognition V1.1 Fin
MIT
A table structure recognition model based on the DETR architecture, specifically designed for detecting and analyzing table structures in documents.
Text Recognition Transformers
microsoft
575
1
Table Transformer Structure Recognition V1.1 Pub
MIT
A Table Transformer model trained on the PubTables-1M dataset for table structure recognition in documents.
Text Recognition Transformers
microsoft
1,634
4
Table Transformer Detection Custom Ale
MIT
A table detection model based on DETR architecture, specifically designed to identify table regions in documents
Text Recognition Transformers
aParadigmP
44
0
Medical Summarization
Apache-2.0
A variant of the T5 Transformer architecture fine-tuned for medical text summarization, generating concise and coherent summaries of medical documents, research papers, clinical notes, and other healthcare-related texts.
Text Generation Transformers English
Falconsai
2,215
133
© 2025 AIbase